

# Research Journal of Pharmaceutical, Biological and Chemical Sciences

# Survey on Performance of FFT Processor.

## V Vamsikrishna, T Surendra Babu, and N Mathan\*.

Department of ECE, Sathyabama University, Chennai, Tamil Nadu, India.

#### ABSTRACT

This paper describes the different types of Fast Fourier Transform (FFT) processors, which play a vital role in digital signal processing, wireless communications and medical electronics. The FFT is used in applications ranging from broadband to 3G and from digital TV to radio LANs. Because of its intensive computational requirements, it occupies considerable area and consumes high energy in hardware. This paper gives an overview of previous work on different FFT processors.

Keywords: Fast Fourier Transform (FFT), Architecture, Pipelining, Multipath, Single path.



\*Corresponding author



#### INTRODUCTION

FFT processors are used in a wide range of applications today. The FFT is a very important block not only in broadband systems but also in digital TV, radar, medical electronics, imaging and the Search for Extraterrestrial Intelligence project. Most of these are real-time systems, which means that the result must be produced within a specified time. The FFT is computationally intensive, so an approach better than a general-purpose processor is necessary to satisfy the requirements at a reasonable cost. The major concerns for researchers are to meet real-time processing requirements, to reduce hardware complexity, mainly with respect to power, and to improve processor speed. Several types of FFT processor are discussed here; examples include memory-based, cache-memory, array, pipelined, sequential, parallel and parallel-iterative architectures.

#### LITERATURE SURVEY

**Jaishri Katekhaye et al (2014)** presented "Review on FFT Processor for OFDM System" [6]. The FFT processor contains a butterfly processing unit, a RAM unit and a ROM unit for storing data, an address generation unit and a control unit. The main parts of the FFT processor are the butterfly processing unit and the address generation unit. The dual-port RAM stores the input data, intermediate results and output, and the ROM stores the twiddle factors. The address generation unit generates the addresses needed to read the operands of each butterfly operation and to store the results back in RAM. The control unit sequentially generates the control signals for each module.
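The division of work between these blocks can be illustrated with a small software model. The following is only a sketch under stated assumptions, not the design reviewed in [6]: it assumes a radix-2 decimation-in-frequency schedule and a power-of-two length, and the names `make_twiddle_rom`, `butterfly`, `address_generator` and `memory_based_fft` are hypothetical. A Python list stands in for the RAM, a precomputed table for the twiddle ROM, and the loop in `memory_based_fft` plays the role of the control unit.

```python
import cmath


def make_twiddle_rom(n):
    """ROM contents: the n/2 twiddle factors W_n^k = exp(-2*pi*i*k/n)."""
    return [cmath.exp(-2j * cmath.pi * k / n) for k in range(n // 2)]


def butterfly(a, b, w):
    """Radix-2 decimation-in-frequency butterfly (assumed butterfly type)."""
    return a + b, (a - b) * w


def address_generator(n):
    """Yield (ram_addr_a, ram_addr_b, rom_addr) for every butterfly, stage by stage."""
    span = n // 2
    while span >= 1:
        for start in range(0, n, 2 * span):
            for j in range(span):
                # W_{2*span}^j equals W_n^(j * n / (2*span)), so one ROM serves all stages.
                yield start + j, start + j + span, j * (n // (2 * span))
        span //= 2


def bit_reverse(k, bits):
    """Reverse the lowest `bits` bits of k."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (k & 1)
        k >>= 1
    return result


def memory_based_fft(samples):
    """In-place FFT: operands are read from RAM and results written back to the same addresses."""
    n = len(samples)
    assert n >= 2 and n & (n - 1) == 0, "length must be a power of two"
    ram = list(samples)
    rom = make_twiddle_rom(n)
    for addr_a, addr_b, rom_addr in address_generator(n):
        ram[addr_a], ram[addr_b] = butterfly(ram[addr_a], ram[addr_b], rom[rom_addr])
    # The in-place DIF schedule leaves the spectrum in bit-reversed order.
    bits = n.bit_length() - 1
    return [ram[bit_reverse(k, bits)] for k in range(n)]
```

For an N-point transform the address generator issues (N/2)·log2N butterfly operations, which is why the butterfly unit and the address generation unit dominate the design.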



Figure 1: Block Diagram of FFT Processor

**K. Umapathy and D. Rajaveerappa (2012)** proposed a Low Power 128-Point Pipeline FFT Processor using Mixed Radix 4/2 [3]. The proposed FFT architecture combines the features of the SDF and MDC styles and consists of Module 1, Module 2, Module 3 and Module 4, together with conjugate, division and multiplexer blocks. The main features of the proposed FFT architecture are as follows. First, the processor can perform 128/64-point FFTs on one to four simultaneous data sequences. Second, the architecture provides multiple outputs, thereby meeting the requirements of the IEEE 802.11n standard. Third, only a small memory is required, because a delay feedback scheme is used to reorder the input data and the intermediate results of each module. A scheduling approach and specific constant multipliers can be employed to reduce the complexity of the multipliers. Hence the proposed FFT processor has lower hardware complexity than the traditional approach of using multiple FFT processors. Finally, a higher-radix FFT algorithm can be used to save power, irrespective of whether a 64-point or a 128-point FFT is performed. The input and output data sequences have a specified order in the proposed architecture. Because the order of the input sequences shown in Figure 2 is the same as that of the data sequences from the ADC, no extra memory is needed to reorder these input sequences before they are loaded into the FFT processor.

July – August 2016 RJPBCS 7(4) Page No. 86





Figure 2: Block Diagram of Receiver using IEEE 802.11n Standards showing the Data Sequences.

In general, the order of the output sequences differs from that of the input sequences in a pipelined FFT architecture. The output order depends on the FFT algorithm, the number of data paths and the FFT architecture. In this design, the output sequences follow the order of the input sequences, as shown in Figure 2. The 128/64-point FFT/IFFT operation is selected by a mode control signal. If a 64-point FFT is performed, the data from Module 1 bypass Module 2 and go directly to Module 3, as shown in Figure 3. Module 1 rearranges the input data from the four data paths into a particular order so that the FFT/IFFT can operate efficiently on multiple data sequences. Module 2 implements a radix-2 FFT algorithm, corresponding to the first stage of the signal flow graph (SFG), as shown in Figure 2. Modules 3 and 4 implement a three-step radix-8 FFT algorithm, corresponding to the second and third stages of the SFG, as shown in Figure 2. Two different approaches (SDF and MDC) are followed in Module 3 and Module 4 to realize the three-step radix-8 FFT algorithm, in order to reduce the amount of memory and to ensure the correctness of the FFT output data.
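The stage structure described above (one radix-2 stage followed by two radix-8 stages, with the radix-2 stage skipped for 64 points) corresponds to factoring the transform length as 128 = 2 × 8 × 8 and 64 = 8 × 8. The following is a minimal arithmetic sketch of that factorization, assuming a generic decimation-in-frequency Cooley-Tukey recursion; it does not model the modules, data paths or reordering of [3], and the function names are hypothetical.

```python
import cmath


def w(n, k):
    """Twiddle factor W_n^k = exp(-2*pi*i*k/n)."""
    return cmath.exp(-2j * cmath.pi * k / n)


def mixed_radix_fft(x, radices):
    """Decimation-in-frequency FFT whose stages use the given radices, first stage first."""
    n = len(x)
    if not radices:
        assert n == 1, "product of radices must equal the input length"
        return list(x)
    r = radices[0]
    assert n % r == 0
    m = n // r
    out = [0j] * n
    for p in range(r):
        # Radix-r butterfly across the r inputs spaced m apart, then the inter-stage twiddle.
        branch = [
            sum(x[j + q * m] * w(r, q * p) for q in range(r)) * w(n, j * p)
            for j in range(m)
        ]
        # Output bins p, p + r, p + 2r, ... come from an m-point FFT of this branch.
        out[p::r] = mixed_radix_fft(branch, radices[1:])
    return out


def direct_dft(x):
    """O(N^2) reference DFT used only to check the result."""
    n = len(x)
    return [sum(x[t] * w(n, t * k) for t in range(n)) for k in range(n)]
```

Calling `mixed_radix_fft(samples, (2, 8, 8))` on 128 samples follows the 128-point stage order, and `mixed_radix_fft(samples, (8, 8))` on 64 samples corresponds to bypassing the radix-2 stage.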



Figure 3: The Proposed 128-points FFT Architecture

**M. Garrido et al (2009)** proposed a parallel architecture [5], also known as the in-place architecture. It consists of a butterfly processing unit and three multi-port buffer units: one to parallelize the input data, one for the data being processed and one for the output. At the butterfly output, a switching module routes the results to the correct memory areas. Control of this architecture is complicated because there is a lot of resource sharing. It is used in low- to medium-speed applications. The architecture offers high throughput but poor hardware efficiency.




**S. He and M. Torkelson (1996)** described the radix-2 single-path delay feedback (R2SDF) architecture, in which data is processed in a specified order [8]. Each stage of this architecture has a data path that goes through the multiplier. The numbers of processing elements and multipliers are the same as in the radix-2 multipath delay commutator architecture, but only N-1 delay elements are required.
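A behavioural model helps to show where the N-1 delay elements come from. The sketch below is an illustration under stated assumptions rather than the circuit of [8]: it assumes a decimation-in-frequency schedule, models each stage as a feedback FIFO plus a butterfly, flushes each stage with zeros instead of tracking pipeline latency, and uses hypothetical function names.

```python
import cmath
from collections import deque


def sdf_stage(stream, depth):
    """One single-path delay feedback stage with a feedback FIFO of `depth` words."""
    fifo = deque([0j] * depth)
    out = []
    for t, x in enumerate(stream):
        delayed = fifo.popleft()
        n = t % depth
        if (t // depth) % 2 == 0:
            # First half of a block: store the input in the feedback FIFO and
            # release the previously stored difference through the multiplier.
            fifo.append(x)
            out.append(delayed * cmath.exp(-2j * cmath.pi * n / (2 * depth)))
        else:
            # Second half: butterfly between the delayed sample and the input.
            # The sum goes forward; the difference is fed back into the FIFO.
            fifo.append(delayed - x)
            out.append(delayed + x)
    return out


def r2sdf_depths(n):
    """FIFO depths of the log2(n) stages: n/2, n/4, ..., 1."""
    depths = []
    depth = n // 2
    while depth >= 1:
        depths.append(depth)
        depth //= 2
    return depths


def r2sdf_fft(x):
    """Push one block through all stages and return the spectrum in natural order."""
    n = len(x)
    assert n >= 2 and n & (n - 1) == 0, "length must be a power of two"
    stream = list(x)
    for depth in r2sdf_depths(n):
        flushed = sdf_stage(stream + [0j] * depth, depth)  # extra zeros flush the FIFO
        stream = flushed[depth:]                           # drop the stage latency
    bits = n.bit_length() - 1
    # The pipeline emits bins in bit-reversed order; undo that for comparison.
    return [stream[int(format(k, f"0{bits}b")[::-1], 2)] for k in range(n)]
```

The FIFO depths returned by `r2sdf_depths(n)` are N/2, N/4, ..., 1, which sum to N-1, matching the delay-element count stated above.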

A single-path delay feedback FFT processor uses many long delay lines. The delay lines are built from shift registers, which are simply cascades of D flip-flops. On each clock edge the data move forward in lock-step, so about half of the registers change state, which wastes power. To achieve low power and low area, an R2SDF architecture has been proposed in which SRAM replaces the shift registers. Performing a read and a write in a single clock cycle requires a two-port memory, but this can be replaced by dual single-port SRAMs, which can save up to thirty-three percent of the area. The read operation is performed during the first half cycle and the write operation during the remaining half cycle. Current-mode SRAMs are used, which helps to reduce power. Two registers, one before and one after the processing element (PE), are inserted to prevent the SRAM data access from becoming the critical path. To access data in the SRAM, a ring counter is used in place of an address decoder, and true-single-phase-clock flip-flops are used in the ring counter to reduce power consumption.
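The difference between the two delay-line styles can be sketched in software. This is a simplified behavioural model, not the circuit described above: a single Python list stands in for the SRAM (the dual single-port arrangement is not modelled), the half-cycle read and write are represented by statement order, and the class names and the `cell_writes` counter are hypothetical.

```python
class ShiftRegisterDelay:
    """Delay line built as a chain of registers: every cell is rewritten on each clock."""

    def __init__(self, depth):
        self.cells = [0] * depth
        self.cell_writes = 0

    def clock(self, value):
        out = self.cells[-1]
        self.cells = [value] + self.cells[:-1]   # all data move one position forward
        self.cell_writes += len(self.cells)
        return out


class RingAddressedDelay:
    """Delay line built from a memory whose row is selected by a one-hot ring counter."""

    def __init__(self, depth):
        self.memory = [0] * depth
        self.select = [True] + [False] * (depth - 1)   # one-hot ring counter state
        self.cell_writes = 0

    def clock(self, value):
        row = self.select.index(True)
        out = self.memory[row]      # first half cycle: read the oldest word
        self.memory[row] = value    # second half cycle: overwrite it with the new word
        self.cell_writes += 1
        # Advance the ring counter by rotating the single active bit.
        self.select = self.select[-1:] + self.select[:-1]
        return out
```

Both classes delay their input by `depth` clocks, but the memory-based version writes one word per clock while the shift register rewrites every cell, which is the intuition behind the power saving.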

**L.R. Rabiner and B. Gold (1975)** described the multipath delay commutator (MDC), which processes data in parallel [4]. In the R2MDC there are two parallel data paths, and the data are processed in parallel in the butterfly units. The two data items to be processed are fed to the butterfly units at the correct time with the help of delay elements and multiplexers. This architecture requires (log2N-2) multipliers and 3N/2-2 delay elements. The radix-4 multipath delay commutator (R4MDC) is the same as the R2MDC except that the radix-4 algorithm is used. In this architecture the components are utilized only 25% of the time, because they are active in only one of every four cycles. This architecture requires 3log4N complex multipliers and 5N/2-4 registers. The main disadvantages are hardware cost and low utilization. However, if the four inputs are fed at the same time, the computational elements are fully utilized, which in turn reduces memory requirements.

The energy-optimized high-performance FFT processor is designed to overcome the disadvantages of the conventional R4MDC architecture. In this architecture the input reordering requires only a small amount of hardware. ROM is used to store the twiddle factors. Because reordering the input data needs only small buffers, the architecture reduces hardware cost and, to a minor extent, energy consumption. The simplified architecture also reduces memory consumption.

**Y. W. Lin et al (2005)** proposed the mixed-radix multipath delay feedback (MRMDF) architecture [11]. It uses both the radix-2 and radix-8 algorithms. Four parallel data paths provide a higher throughput rate. Data must be rearranged in two places: the input data and the results of each module. Because a feedback scheme is used for this, memory consumption is reduced. A scheduling scheme and specially designed constant multipliers are used to reduce the number of complex multipliers. This design achieves a throughput of 1 Gsample/s at a clock rate of only 250 MHz.

**S. He and M. Torkelson (1998)** proposed the "Design of pipelined FFT processor for OFDM" [9]. In the multipath delay commutator (MDC), the input is split into multiple parallel data streams by a commutator; the butterfly operation is then followed by twiddle factor multiplication, with appropriate delays applied to each data stream. In the radix-2 MDC (R2MDC), the input data stream is divided into two parallel paths, as shown in Figure 4 for N equal to 16. In total, (log2N - 1) complex multipliers, log2N butterfly units and (3N/2 - 2) delay buffers are required. All the butterfly units and multipliers can be utilized 100% with efficient input buffering. Figure 5 shows the radix-4 MDC (R4MDC) architecture, in which four parallel streams are processed at once. A total of (3log4N - 1) complex multipliers, log4N radix-4 butterfly units and (5N/2 - 4) words of memory are required.
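To make these counts concrete, the expressions quoted in this paragraph can be evaluated for a given N. The helper names below are hypothetical and the sketch only restates the formulas as given in the text; it does not derive or verify them.

```python
def exact_log(n, base):
    """Return log_base(n), requiring n to be an exact power of base."""
    stages = 0
    while n > 1:
        assert n % base == 0, "n must be a power of the radix"
        n //= base
        stages += 1
    return stages


def r2mdc_resources(n):
    """Counts quoted in the text for the radix-2 MDC."""
    stages = exact_log(n, 2)
    return {
        "complex_multipliers": stages - 1,   # log2(N) - 1
        "butterflies": stages,               # log2(N)
        "delay_words": 3 * n // 2 - 2,       # 3N/2 - 2
    }


def r4mdc_resources(n):
    """Counts quoted in the text for the radix-4 MDC."""
    stages = exact_log(n, 4)
    return {
        "complex_multipliers": 3 * stages - 1,   # 3*log4(N) - 1
        "butterflies": stages,                   # log4(N)
        "memory_words": 5 * n // 2 - 4,          # 5N/2 - 4
    }
```

For N = 16, the case drawn in the figures, these expressions give 3 multipliers, 4 butterflies and 22 delay words for the R2MDC, and 5 multipliers, 2 butterflies and 36 memory words for the R4MDC.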





Figure 4: Radix-2 Multipath Delay Commutator FFT Architecture (N=16)



Figure 5: Radix-4 Multipath Delay Commutator FFT Architecture (N=16)

**S. He and M. Torkelson (1998)** proposed "Design of pipeline FFT processor for OFDM" [9]. In single-path delay feedback (SDF) architectures, a data sequence goes through a multiplier in each stage. The delay units are utilized more efficiently by sharing the same memory between the inputs and outputs. The radix-2 SDF architecture is shown in Figure 6. Careful observation of this architecture shows that the utilization of the multipliers and butterfly units is 50%, because they are bypassed for half of the time.



Figure 6: Radix-2 Single Path Delay Feedback FFT Architecture

#### CONCLUSION

Different types of FFT processor and their implementations were discussed. Because the FFT plays an important role in many digital systems, the design of such processors is also vital. The design of these architectures is generally somewhat complicated. The main aspect to concentrate on in FFT architectures is twiddle factor multiplication; when the multiplication can be completed with a small memory requirement, low power consumption and high accuracy, the design will be better suited to today's applications.

#### REFERENCES

- [1] BG Jo, MH Sunwoo. IEEE Trans Circuits Syst I 2005;52:911–919.
- [2] K Umapathy, D Rajaveerappa. International Journal of Soft Computing and Engineering (IJSCE) 2012;2:2231-2307.
- [3] LR Rabiner, B Gold. Prentice-Hall, Inc., 1975.
- [4] M Garrido, J Grajal, MA Sanchez, O Gustafsson. IEEE Trans VLSI Syst 2012.
- [5] Jaishri Katekhaye, Amit Lamba, Vipin. IJAICT 2014:1.
- [6] P Duhamel, H Hollmann. Electron Lett 1984;20:14–16.
- [7] S He, M Torkelson. In Proc Int Parallel Processing Symp 1996:766-770.
- [8] S He, M Torkelson. In Proc of IEEE URSI International Symposium on Signals, Systems, and Electronics 1998:257–262.
- [9] Santhiya V, Mathan N. ARPN Journal of Engineering and Applied Sciences 2015;10:4456-4461.
- [10] YW Lin, HY Liu, CY Lee. IEEE J Solid-State Circuits 2005;40:1726–1735.
- [11] Yang KJ, Chuang. IEEE Trans VLSI Syst 2013;21:720-731.
- [12] Yun-Nan Chang. IEEE Trans Circuits Syst II 2008;55:1234–1238.
